SEB genotyping: SmartAmp-Eprimer binary code genotyping for complex, highly variable targets applied to HBV

Background: SmartAmp-Eprimer Binary code (SEB) Genotyping is a novel isothermal amplification method for rapid genotyping of any variable target of interest.
Methods: After in silico alignment of a large number of sequences and computational analysis to determine the smallest number of regions to be targeted by SEB Genotyping, SmartAmp primer sets were designed to obtain a binary code of On/Off fluorescence signals, each code corresponding to a unique genotype.
Results: Applied to HBV, we selected 4 targets for which fluorescence amplification signals produce a specific binary code unique to each of the 8 main genotypes (A–H) found in patients worldwide.
Conclusions: We present here the proof of concept of a new genotyping method specifically designed for complex and highly variable targets. Applied here to HBV, SEB Genotyping can be adapted to any other pathogen or disease carrying multiple known mutations. Using simple preparation steps, SEB Genotyping provides accurate results quickly and will enable physicians to choose the best-adapted treatment for each of their patients.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12879-022-07458-4.


Background
Genetic variation between individuals within a population (viral, bacterial or even human) can be silent, can lead to different phenotypes or diseases, or can cause death if it occurs in essential genes. Genetic variants can be found either at a single nucleotide position or in more complex distribution patterns over a gene. Examples of viruses with complex genetic variants are the human papillomavirus (HPV), where out of over 170 closely related types, HPV types 16 and 18 lead to over 60% of HPV-related cancers [1,2], and the hepatitis B virus (HBV). HBV is a leading cause of liver cancer, resulting in over 880,000 deaths per year [3]. It is divided into 10 genotypes (A-J), of which 5 cause over 96% of infections worldwide [genotypes C (26%), D (22%), E (18%), A (17%) and B (14%), respectively] [4], and extensive studies have shown that HBV genotypes influence the evolution and prognosis of liver disease, the risk of complications, and the response to treatment [5][6][7].
While sequencing is considered the gold standard for the detection of all genetic variations, its cost and preparation time are often cited as major limitations to its use [8,9]. Isothermal amplification methods have been developed that are as sensitive as PCR while producing results much faster and at low cost, making them easy and attractive tools for small point-of-care settings or developing regions, as recently mentioned in multiple reviews [10][11][12][13]. SmartAmp is an isothermal nucleic acid amplification method [14] that can be combined with sequence-specific probes such as Exciton-Controlled Hybridization-sensitive fluorescent Oligonucleotides (ECHOs) for genotyping [15][16][17]. Called "Exciton probe-Eprobe" [18,19] or "Exciton primer-Eprimer", ECHOs are oligonucleotides that only emit fluorescence upon binding to their sequence-specific targets. So far, SmartAmp has been used for SNP genotyping (wild-type vs mutant) with either single [20] or duplex Eprimer labelling [21][22][23].
We hypothesized that combining Eprimer On/Off fluorescence signal detection into a specific, digitized or binary code would enable us to distinguish and identify complex genetic variants. To realize this concept, we applied our analysis results to HBV genotyping. Historically, most HBV genotyping methods based on nucleic acid amplification and detection have focused on the highly conserved pre-S/-S gene region to distinguish between different HBV genotypes and sub-genotypes [5]. Utilizing this region, we developed our new SmartAmp binary code genotyping primer sets for HBV. Here, we describe a new usage of SmartAmp-Eprimer, combining in silico alignment of a large number of sequences and computational analysis to determine the smallest number of regions targeted for amplification, leading to a binary code of On/Off fluorescence signals, each code corresponding to a unique genotype. We named this new technique "SmartAmp-Eprimer Binary code Genotyping (SEB Genotyping)".

HBV sequence alignments
Alignment of HBV sequences followed a protocol similar to the one described in [24] but revised by us for large data analysis. All the FASTA sequences for the S region of HBV sorted by genotype A to H available on the HBV database (HBVdb) [25] website were downloaded (https://hbvdb.lyon.inserm.fr/HBVdb/HBVdbDataset?seqtype=0, accessed May 8th 2018). For each of the 8 genotypes, all the sequences available (shown in Fig. 1a) were aligned to create a consensus sequence using MAFFT on Jalview desktop software (Jalview 2.10.4) and their PIDs (percentage of identity between the multiple sequences) were exported [26]. These 8 consensus sequences were then aligned to create a pan-genotype consensus sequence and PID, using the aforementioned method. Results are shown in Additional file 1. Using the alignment of 8 consensus sequences instead of the global alignment of over 20,000 HBV sequences permitted us to reduce the bias against rare genotypes (Additional file 2).
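To make the consensus and PID step concrete, the following is a minimal Python sketch, not the Jalview implementation used in this study. It assumes the sequences are already aligned to equal length (for example by MAFFT) and defines the PID of a column as the percentage of sequences carrying the most frequent character; Jalview's exact treatment of gaps and ties may differ. The function name `consensus_and_pid` and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus_and_pid(aligned):
    """Return (consensus string, per-column percent identity).

    aligned: list of equal-length strings taken from a multiple alignment.
    Gap characters are counted like any other character (an assumption).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    consensus, pid = [], []
    for column in zip(*aligned):
        # Most frequent character in this alignment column and its count.
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        consensus.append(base)
        pid.append(100.0 * count / len(column))
    return "".join(consensus), pid


# Toy alignment of four sequences (illustrative data only).
toy_alignment = ["ACGT", "ACGA", "ACTT", "ACGT"]
toy_consensus, toy_pid = consensus_and_pid(toy_alignment)
print(toy_consensus, toy_pid)
```

Running the same function once per genotype, and then once more on the eight resulting consensus sequences, mirrors the two-level alignment described above.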

Selection of the target positions for genotyping
For each position in the pan-genotype sequence (Fig. 1b), the PID values were analyzed to identify nucleotides that were over 90% conserved within one genotype but differed between genotypes: for example, at position 87, genotype A is 96.43% conserved as G but genotype C is 98.08% conserved as A. The minimum combination of specific nucleotide positions that permitted discrimination between genotypes was then selected. Because we were targeting a relatively small (681 bp) and highly conserved region, this selection was done manually using Excel. In our dataset, all 8 genotypes could be identified using 4 targeted nucleotide positions (Fig. 1c, PID per genotype > 90% or otherwise noted).
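The selection was done by hand in this study, but for larger or less conserved targets the same logic could be automated. The Python sketch below is one possible approach under stated assumptions, not the procedure we used: it takes per-genotype consensus strings and PID lists, keeps positions where every genotype is conserved at or above a threshold, and searches exhaustively for the smallest set of (position, base) targets whose On/Off pattern is unique for each genotype. All names and the toy data are hypothetical, and positions are 0-based in the code whereas the text uses 1-based numbering.

```python
from itertools import combinations


def candidate_targets(consensus, pid, min_pid=90.0):
    """List (position, base) pairs that could serve as On/Off targets.

    consensus: dict genotype -> consensus string (all the same length)
    pid: dict genotype -> list of percent identity per position
    """
    genotypes = sorted(consensus)
    length = len(consensus[genotypes[0]])
    targets = []
    for i in range(length):
        # Require every genotype to be well conserved at this position.
        if any(pid[g][i] < min_pid for g in genotypes):
            continue
        bases = {consensus[g][i] for g in genotypes}
        if len(bases) > 1:  # the position differs between genotypes
            targets.extend((i, b) for b in sorted(bases))
    return targets


def binary_code(seq, targets):
    """'1' where the sequence matches the targeted base, '0' otherwise."""
    return "".join("1" if seq[i] == b else "0" for i, b in targets)


def smallest_target_set(consensus, pid, min_pid=90.0):
    """Exhaustive search for the fewest targets giving unique codes."""
    cands = candidate_targets(consensus, pid, min_pid)
    for k in range(1, len(cands) + 1):
        for combo in combinations(cands, k):
            codes = {g: binary_code(s, combo) for g, s in consensus.items()}
            if len(set(codes.values())) == len(codes):
                return combo, codes
    return None, {}


# Toy data: four hypothetical genotypes, four positions, all 95% conserved.
toy_consensus = {"W": "AAGT", "X": "ATGT", "Y": "AACT", "Z": "ATCT"}
toy_pid = {g: [95.0] * 4 for g in toy_consensus}
toy_targets, toy_codes = smallest_target_set(toy_consensus, toy_pid)
print(toy_targets, toy_codes)
```

The exhaustive search is only practical for a modest number of candidate positions; for a 681 bp conserved region this is unlikely to be limiting, but larger targets may call for a greedy or set-cover heuristic instead.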

SmartAmp primer sets design
SmartAmp primer sets were designed as previously described [14,20] to amplify the regions surrounding the four selected target positions. Following the recommendations described by Kimura et al. [17,27], Eprimers were carefully designed so that (i) the ECHO-labeled thymidine (marked Z in Table 1) was not placed in the last two positions at the 5′ or 3′ end of the oligonucleotide, (ii) it was not surrounded by a mismatch within 2 bases either in the 5′ or 3′ direction, and (iii) the four targeted bases for genotyping (i.e., positions 87, 170, 203 and 390-392 of our pan-genotype consensus sequence) were placed as the penultimate 3′-end nucleotide of each Eprimer for On/Off effect. Eprimers were purchased from DNAFORM (Yokohama, Japan), and DNA oligonucleotides were purchased from Eurofins Genomics (Tokyo, Japan) or Sigma-Aldrich (Tokyo, Japan). Sequences for standard oligonucleotides and Eprimers (noted oBP) are listed in Table 1.

Fig. 1 Flowchart to determine mutation positions used for genotyping. (a) FASTA sequences for the S region of HBV were downloaded from HBVdb and aligned into a sequence. (b) For each position a mutation score was calculated and (c) targets were selected that are > 90% conserved within one genotype but different between different genotypes

SmartAmp reaction
Each SmartAmp reaction mixture was carried out in a total volume of 25 µl and contained 3. sample. Samples for genotyping (12.5 µl/well) were prepared by adding 0.6 µl of 1 M NaOH to 11.9 µl of 12,500 copies of purified plasmid or viral DNA per well. The samples were heat-denatured at 95 °C for 3 min and chilled for 3 min at 4 °C before adding 12.5 µl of the reaction mixture. All reactions were performed on a LightCycler 480 II (Roche Diagnostics, Mannheim, Germany). Amplification was run for 60 cycles of 1 min at 67 °C and fluorescence signals were detected during each cycle using a custom thiazole orange filter range (excitation: 498 nm, emission: 580 nm). The data were transferred to Microsoft Excel (Microsoft, Redmond, WA, USA) for plotting.

Plasmid template sequences
Eight different pEX-A2J2 vectors containing the 681 bp-long full HBV S-region genotype-specific consensus sequences (A-H) were ordered from Eurofins Genomics. A map of these plasmids can be found in Additional file 3.

Sanger sequencing
All 9 human samples were genotyped and examined for the presence of mutations, as described previously [19]. PrimeSTAR (Takara Bio) PCR reaction was performed following the manufacturer's instructions and using the following primers: HBV_MAFFT.Bf.41-18, CCT AGG ACC CCT GCT CGT; and HBV_MAFFT.Br.601-16, ACA GAC TTG GCC CCCA. Thermal cycling conditions included preincubation at 95 °C for 30 s, followed by 50 cycles at 98 °C for 10 s, 60 °C for 5 s, and 72 °C for 36 s, and extension at 72 °C for 5 min. The PCR products were purified using the QIAquick PCR purification kit (Qiagen, Tokyo, Japan) and processed for DNA sequencing using ABI PRISM BigDye Terminator version 3.1 (Applied Biosystems, Waltham, MA, USA) with the same forward or reverse primer. Sequence data were generated using the ABI PRISM 3730 DNA Analyzer (Applied Biosystems). These sequences were compared to the consensus genotypes sequences using Clustal Omega [31] and their genotypes were assessed using HBVdb online tools [25].

Results
Using plasmid DNA carrying the full S-region consensus sequence for each of the 8 HBV genotypes, we tested our four genotyping primer sets targeting the positions 87, 170, 203 and 390-392 (Fig. 2a).
The expected results are a sigmoid amplification curve for a full match between Eprimer and template (signal On) or no amplification curve for a mismatch (signal Off). Because Eprimers bind tightly to their sequence-specific targets and a single mutation can inhibit the emission of fluorescence, we determined that even one positive amplification signal out of multiple replicates should be read as positive or "On", and there should in theory be no false positives. Conversely, the fluorescence signal emitted by ECHOs is very strong and can be detected even at low emission levels, so there should be no false negatives within the limit of detection of the assay. The combination of these On/Off signals for each of the 4 targeted positions provides a unique binary code permitting the specific genotyping of the sample tested. Comparing the amplification curve signals for each plasmid template (Fig. 2a) with the signal detection code (Table 2), we can see that each of the 8 samples matched perfectly the digitized On/Off pattern specific to the corresponding genotype.
Of note, the SmartAmp primer set for target 203 (and to a lesser extent target 87) sometimes emits a low non-specific fluorescent signal. Given that the signal intensity and the amplification curves of these targets are quite different from the expected intensity and sigmoid curves, they are considered background emissions and can easily be interpreted as noise rather than a positive signal (Fig. 2a: genotype A target 203 and genotype H target 87). Alternatively, a signal intensity cutoff can be set for easier interpretation (cutoff at 40 RFU for target 203 and at 3 RFU for target 87). Moreover, the melting curve analysis of such non-specific signals shows a different Tm value from the true-positive samples (Additional file 5), confirming that the non-specific product is clearly different from the true-positive one and thus the signal should be read as negative.
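The reading rule described above (any replicate above a per-target cutoff counts as On) can be sketched in a few lines of Python. This is an illustration under assumptions rather than the analysis pipeline used here: the cutoffs for targets 203 and 87 are the values quoted above, whereas the cutoffs for targets 170 and 390-392, the digit order of the code, the use of end-point fluorescence as the readout, and the replicate values are all hypothetical. The sketch does not assess curve shape or melting temperature.

```python
def call_signal(replicate_rfu, cutoff):
    """Return True (On) if any replicate exceeds the intensity cutoff."""
    return any(rfu > cutoff for rfu in replicate_rfu)


def read_binary_code(endpoint_rfu, cutoffs, target_order):
    """Build the On/Off code string, one digit per target."""
    return "".join(
        "1" if call_signal(endpoint_rfu[target], cutoffs[target]) else "0"
        for target in target_order
    )


# Assumed digit order; the order used in Table 2 may differ.
TARGET_ORDER = ["87", "170", "203", "390-392"]

# 40 RFU (target 203) and 3 RFU (target 87) are quoted in the text;
# the values for 170 and 390-392 are placeholders, not measured cutoffs.
CUTOFFS = {"87": 3.0, "170": 3.0, "203": 40.0, "390-392": 3.0}

# Hypothetical end-point fluorescence (RFU) for two replicates per target.
sample_rfu = {
    "87": [55.0, 0.4],
    "170": [0.2, 0.1],
    "203": [12.0, 60.5],
    "390-392": [0.0, 0.3],
}

sample_code = read_binary_code(sample_rfu, CUTOFFS, TARGET_ORDER)
print(sample_code)
```

In this toy input the 12 RFU replicate for target 203 stays below the 40 RFU cutoff, which is the kind of low background signal the cutoff is meant to exclude; the second replicate alone is enough to call the target On.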
After successful genotyping of plasmid DNA, we tested our technique on natural viral DNA. The HBV25 cell line is infected with HBV genotype C [28] and HepG2.2.15.7 with genotype D [29,30] and both cell lines release viral particles in the cell culture supernatant. The amplification curve patterns clearly follow the On/Off binary code specific to genotype C for HBP25 (1010) or genotype D for HepG2.2.15.7 (1111) (Fig. 2b). To confirm that our method was working in clinical samples, we then proceeded to test our assay on serum samples from patients chronically infected with HBV. Because the number of samples was limited and all carried the same genotype, our study cannot qualify as a clinical study and should be considered a proof of concept. Based on preliminary results on the limit of detection of our primer sets (Additional file 6), nine human serum samples (Fig. 2c) were tested at 1.25 × 10⁵ copies/reaction. The positive control was our plasmid DNA genotype D at 1.25 × 10⁴ copies/reaction. The amplification curve patterns for each sample were analyzed to determine their specific binary code and compared to the genotype given in the patient cards.
The samples were also subjected to Sanger sequencing and genotyped. The concordance between the patient cards (genotyped by PCR-Invader assay) and Sanger sequencing was 100% (9/9). Using SEB Genotyping, 7 out of 9 samples (78%) had concordant results with the EIA genotyping and Sanger sequencing. One sample (SHBV-109) was ambiguous, showing the pattern "0111" for genotype F. Sequence analysis identified it as genotype C with a T > C mutation in the target position 203 inducing this pattern. Sample SHBV-116 was genotyped as A by our method, as one of the two replicates for target position 87 showed a clear amplification signal, but as C by both the reference method and Sanger sequencing. Further post-amplification testing using melting curve showed the melting peak patterns were clearly different between this specific case and the positive control (Additional file 7) or other genotypes (Additional file 5). The amplification product obtained is not the same as for other samples and a melting curve analysis could distinctly identify a false positive.

Discussion
Recent international guidelines [32] for treatment of HBV patients recommend HBV genotyping before initiating pegylated interferon therapy, making it necessary to develop new, simple, convenient and accurate technologies for HBV genotyping even at community hospitals and clinics. Our new technique's one-well amplification and detection step reduces the risk of contamination compared to other methods (e.g. the PCR-Invader assay), and only 4 regions need to be detected (compared with 15 for the oligonucleotide microarray [33]). A digital binary code for HBV detection was previously developed using a lateral flow immunoassay [34], but it required an 8-digit binary code to distinguish between genotypes A, B, C and D, while our 4-digit binary code can distinguish 8 HBV genotypes from A to H.
We showed that our technology works well on plasmid DNA and on viral particles extracted from cell culture supernatant. One limitation in our experiments was the very small number (n = 9) and very low volume (15 µl each) of patient serum samples available, all of them being genotype C. Indeed, all samples are from Japanese patients with chronic HBV infection, and in this population around 80% are infected with genotype C [35][36][37]. The number is also low because we tested human samples as an example to confirm whether the expected theoretical pattern was obtained with our method using actual samples, rather than to extensively validate it in a clinical study. While 7 out of 9 samples did show the expected pattern, two samples had conflicting results. Long-term storage of small sample volumes as well as the DNA extraction step increase the risk of degradation and loss of template [38], which may have caused the discordant result found in sample SHBV-116. Alternatively, a rare primer dimerization event may have induced this signal, as this pattern was seen in only one replicate of one patient sample while all other patient samples as well as plasmids and infected cell lines showed the expected amplification pattern. While this specific sample is not representative of the assay, its false positive result highlights that our method needs further optimization and testing at a larger scale before aiming for the clinical setting. Another limitation of nucleic acid detection-based methods is the vulnerability of the signal to mutations. We selected the 4 target positions to be at least 90% conserved within one genotype while different from other genotypes, unless otherwise noted in Fig. 1c.
Target position 203 is 86% conserved as T in genotype C, but one of our samples (SHBV-109) was sequenced as carrying a minor variant with a T > C mutation at this position, resulting in a miscall as genotype F. While it is unfortunate to encounter a minor variant in such a small number of samples, the issue here lies not with the SEB Genotyping method but with the selection of the target, indicating that target 203 may not be an adequate position for our purpose. It may be necessary for future users selecting their own SEB Genotyping targets to raise the cutoff above 90% to avoid such discordant results. Regarding the low non-specific background signal that is sometimes observed, one could design Eprimers with the labelled base within 3 nucleotides of the targeted variant, anywhere on the Eprimer sequence and not only at the penultimate 3′-end nucleotide. In case of mismatch, this would interfere with the binding of the Eprimer to the viral DNA, preventing the quenched thiazole orange moieties from separating from each other and effectively inhibiting the emission of fluorescence.
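A rough calculation illustrates how the conservation cutoff affects the chance of meeting a minor variant in a small cohort. The Python sketch below assumes that each sample is an independent draw and that the database conservation value applies to the tested population; both are simplifications, and the function name is hypothetical. Under these assumptions, with 86% conservation and 9 samples, the probability that at least one sample carries a variant at the target position is roughly 74%.

```python
def prob_at_least_one_variant(conservation, n_samples):
    """Probability that >= 1 of n independent samples is a minor variant.

    conservation: fraction of sequences carrying the expected base (0 to 1).
    Assumes independent samples drawn from the database frequency.
    """
    return 1.0 - conservation ** n_samples


# Compare the 86% conservation of target 203 in genotype C with
# stricter hypothetical cutoffs, for a cohort of 9 samples.
for conservation in (0.86, 0.90, 0.95, 0.99):
    p = prob_at_least_one_variant(conservation, 9)
    print(f"conservation {conservation:.0%}: P(at least one variant) = {p:.2f}")
```

Raising the conservation requirement lowers this probability steeply, which is consistent with the suggestion above that future target selections may need a cutoff higher than 90%.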
We had no information about the past treatment regimen of the 9 patient samples tested in this study or about their drug resistance status. Because the selected targets are highly conserved within each genotype, we believe that they are necessary in some way for the viral life cycle and that the accumulation of mutations due to resistance to treatment would probably not affect these specific sites, although this point should be further studied and verified. If HBV DNA can be amplified from the patient serum (as in patients with persistent viremia or virological breakthrough), SEB Genotyping can be used regardless of past treatment. To our knowledge, current treatments or those in development (nucleos(t)ide analogs, siRNA, various inhibitors of the viral cycle or immunomodulators) [39] do not directly induce mutations in the HBV genome and should not have any effect on the detection and genotyping of the virus by SEB Genotyping. A hypothetical therapy using CRISPR or a similar technology to insert a mutation specifically in the regions targeted in our manuscript would prevent SEB Genotyping from being used in its current form, but it would be possible to design new primers targeting another region to solve the issue. From our point of view, only the viral load and detection limit would determine whether or not SEB Genotyping can be used to genotype patients for prediction of treatment resistance.

Conclusion
We developed a new usage for SmartAmp-Eprimer, making use of the strongly sequence-specific binding of Eprimers to their targets to create an On/Off genotyping tool specifically designed for highly variable targets.
Regarding the application of SEB Genotyping to the detection of HBV genotypes, while the human serum sample testing results shown here permit us to foresee satisfactory genotyping results with our new method, they are only a proof of concept. Further testing on a larger number of samples, as well as on human sera with other HBV genotypes, will be necessary to better assess the sensitivity and accuracy of our assay in clinical settings. Future development of SEB Genotyping would include expansion to other targets to distinguish between various bacterial or viral infections, for example for the fast, on-site distinction between the crucial variants of concern of the SARS-CoV-2 virus [40][41][42] in COVID-19 patients, signature mutations linked to cancer development and prognosis, or any other highly variable part of a genome of interest. SEB Genotyping was optimized so that all 4 primer sets run under the same conditions, reducing the technical hurdles, and could easily be adapted into a microfluidics chip.